Proposed bypass_cache directive in Query #473

Merged
merged 2 commits into from
Feb 29, 2024

edeutsch
Collaborator

@edeutsch edeutsch commented Jan 4, 2024

No description provided.

@cbizon
Contributor

cbizon commented Feb 14, 2024

Just following up from the 2/13 Architecture call - is there a definition of how much caching needs to be bypassed to comply? Just to muddy the waters with some examples -

ARAGORN caches KP edges, and then on top of that caches full responses. In a bypass-cache scenario is the goal to bypass both kinds?

Unsecret "caches" in the sense that they pre-build a KG based on large inputs that they periodically receive. Since there is no way to rebuild that on the fly, any downstream caching doesn't really matter since the source data won't change.

I think ARAX pre-caches MVP1 results in a way that can't really be regenerated on the fly. How should it respond? (Please correct me if I'm wrong.)

So I guess maybe it's going to be a bit fuzzy, but is the idea something like "bypass all caching as much as possible"?

@edeutsch
Collaborator Author

I agree that this is worth documenting carefully.
You are right that for the MVP1 query, ARAX consults some pre-computed results from an ML model (among other sources) that cannot be regenerated on the fly.

So I agree that it would be a bit fuzzy. I would state it as "bypass as much caching as is feasible".

How about this for a succinct statement:

When a client provides the bypass_cache=true option to an agent, the agent MUST request fresh information from KPs in all cases where it has a viable choice between requesting fresh data and using cached information.

Here "viable choice" is the finessed phrase. I suppose I would define it as: does the agent have code in place that could either request fresh data or use cached data? If there is no code in place to request fresh data in real time, then there is no viable choice.

comments?
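As an illustration of the "viable choice" rule above, here is a minimal Python sketch. The function name `get_edges`, the dict-based cache, and the optional `fetch_live` callable are all hypothetical and not taken from any Translator codebase; the assumption is that an agent with no live-fetch code path simply passes `None`.

```python
from typing import Callable, Dict, List, Optional


def get_edges(
    key: str,
    cache: Dict[str, List[str]],
    fetch_live: Optional[Callable[[str], List[str]]],
    bypass_cache: bool = False,
) -> List[str]:
    """Return edges for `key`, bypassing the cache only where a viable choice exists."""
    # A "viable choice" exists only if code to request fresh data is in place.
    has_viable_choice = fetch_live is not None

    if bypass_cache and has_viable_choice:
        fresh = fetch_live(key)
        cache[key] = fresh  # refresh the cache with the new result
        return fresh

    # Either no bypass was requested, or there is no way to fetch fresh data.
    if key in cache:
        return cache[key]
    if has_viable_choice:
        result = fetch_live(key)
        cache[key] = result
        return result
    return []
```

With `fetch_live=None` (for example, a pre-built KG or pre-computed model output), `bypass_cache=True` falls through to the cached data, which matches the "no viable choice" case described above.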

@edeutsch
Collaborator Author

I have refined the definition of bypass_cache based on conversation to:

        Set to true in order to request that the agent obtain
        fresh information from its sources in all cases where
        it has a viable choice between requesting fresh information
        in real time and using cached information.

Please approve, or suggest further refinements, or comment with problems you have with this, or at least be prepared to come to a resolution at tomorrow's TRAPI call.
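For reference, a client request using this option might look like the sketch below. It assumes `bypass_cache` sits at the top level of the Query object next to `message`, as the PR title suggests; the helper name `build_query` and the example query graph contents are illustrative only.

```python
import json


def build_query(query_graph: dict, bypass_cache: bool = False) -> dict:
    """Build a TRAPI-style Query body, adding bypass_cache only when requested."""
    query = {"message": {"query_graph": query_graph}}
    if bypass_cache:
        # Assumption: the flag is a top-level boolean property of Query.
        query["bypass_cache"] = True
    return query


# Illustrative one-hop query graph; identifiers are placeholders for the example.
example_graph = {
    "nodes": {
        "n0": {"ids": ["MONDO:0005148"]},
        "n1": {"categories": ["biolink:ChemicalEntity"]},
    },
    "edges": {
        "e0": {"subject": "n1", "object": "n0", "predicates": ["biolink:treats"]},
    },
}

if __name__ == "__main__":
    print(json.dumps(build_query(example_graph, bypass_cache=True), indent=2))
```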

@cbizon
Contributor

cbizon commented Feb 14, 2024

I like this refinement FWIW

Contributor

@brettasmi brettasmi left a comment

I'm approving, but I think the standard way to handle this is a Cache-Control: no-cache header, which tells the caching server to validate the cached response as "fresh" before responding. Ref. I'm not sure this behavior maps exactly to what Translator is doing, but it's standard for bypassing caches.
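For comparison, the header-based approach could be sketched as follows in Python's standard library. The URL is a placeholder and `build_request` is a hypothetical helper; the request is only constructed here, not sent, and whether a given Translator service honors this header is not established in this thread.

```python
import json
import urllib.request


def build_request(url: str, payload: dict) -> urllib.request.Request:
    """Build a POST request asking intermediary caches to revalidate before responding."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Standard HTTP request directive: do not serve a stored
            # response without validating it with the origin first.
            "Cache-Control": "no-cache",
        },
        method="POST",
    )


# Placeholder endpoint for illustration only.
req = build_request("https://example.org/query", {"message": {}})
```

Note that this header addresses HTTP-level caches, while the `bypass_cache` property discussed in this PR is carried in the request body and targets the agent's own internal caching.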

Also note that allowing users to explicitly bypass the cache could be used to maliciously overload the server, so it's advised that services implementing such behavior also implement some amount of rate limiting, etc.
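One way to act on the rate-limiting suggestion is a small sliding-window limiter applied only to requests that set `bypass_cache`. This is a sketch under assumptions: the class name `BypassRateLimiter`, the per-client keying, and the limits are all hypothetical, and a production service would more likely rely on its gateway or framework for this.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class BypassRateLimiter:
    """Allow at most `max_requests` cache-bypassing requests per client per window."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client_id: str, now: Optional[float] = None) -> bool:
        # `now` can be injected for testing; otherwise use a monotonic clock.
        now = time.monotonic() if now is None else now
        hits = self._hits[client_id]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

When `allow` returns `False`, a service could either reject the request or fall back to serving cached results; which is appropriate is a policy decision not settled in this thread.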

@edeutsch edeutsch merged commit 0c5f1fb into 1.5 Feb 29, 2024
@edeutsch edeutsch deleted the edeutsch-bypass_cache branch May 28, 2024 15:25
6 participants